Abstract



This paper introduces a simple but highly efficient ensemble for robust texture classification,
which can effectively deal with translation, scale, and significant viewpoint changes.
The proposed method first inherits the spirit of the spatial pyramid matching model (SPM), which is
popular for encoding the spatial distribution of local features, but in a flexible way, partitioning the
original image into different levels and incorporating different overlapping patterns at each level.
This flexible setup helps capture the informative features and produces sufficient local feature
codes by some well-chosen aggregation statistics or pooling operations within each partitioned
region, even when only a few sample images are available for training. Each texture image
is then represented by several orderless feature codes, and thereby all the training data form a reliable
feature pond. Finally, to take full advantage of this feature pond, we develop a collaborative
representation-based strategy with locality constraint (LC-CRC) for the final classification.
Experimental results on three well-known public texture datasets demonstrate that the proposed
approach is very competitive and even outperforms several state-of-the-art methods. In particular,
when only a few samples of each category are available for training, our approach still achieves
very high classification performance.



1 Introduction



Texture is widely considered as a fundamental ingredient of the structure of natural images, and 
texture classification is an important problem in computer vision with many applications. Yet de- 
spite almost 50 years of research and development, designing a high-accuracy and robust texture 
classification system for real-world applications remains a challenge for at least three reasons: the 
wide range of various natural texture types; the presence of large intra-class variations in texture 
images, such as rotation, scale, viewpoint, and even non-rigid surface deformation, caused by arbi- 
trary viewing and illumination conditions; and the demands of low computational complexity and a 
desire to limit algorithm tuning [5]. 

There are four basic elements that constitute a reliable texture classification system, as Liu et
al. point out in [3]: (1) local texture descriptors, (2) non-local statistical descriptors, (3) the design
of a distance/similarity measure, and (4) the choice of classifier. The Bag-of-Features (BoF)
model treats an image as a collection of unordered appearance descriptors extracted from local
patches, quantizes them into discrete "visual words", and then computes a compact histogram
representation for semantic image classification. As a result, recent
Figure 1: Several samples of three categories from the UMD texture database [4]. From (a) and (b),
we can see that the statistical information within the regions from various partition levels can capture
multiple-scale feature information. As a result, local scale and translation differences can be
effectively alleviated. (c) presents different overlapping patterns in a 4 x 4 partition. Here only the
diagonal regions are plotted for better illustration. This overlapping partition strategy helps us
capture reliable and redundant information of the textures.



interest in texture classification has focused on representing a texture non-locally by the distribution of local
textons [H [3] [6] [2], which achieves state-of-the-art performance.

As an extension of BoF, the spatial pyramid matching model (SPM) [7] has emerged as a popular
framework that represents an image by extracting image descriptors such as SIFT [8] or HOG [51 on a dense
grid, encoding them over a learned dictionary, and then summarizing the distribution of the codes
in the cells of a spatial pyramid by some well-chosen aggregation statistics, or pooling operation.
The SPM paradigm has achieved remarkable success on a range of image classification benchmarks and
has become a major component of state-of-the-art systems [10] [11] [Ijj. Inspired by SPM, we
introduce a similar framework that partitions an image into increasingly fine segments, but in
a more flexible way, exploiting multi-level partitions with various overlapping patterns and thereby
forming redundant local texture feature codes for each region by a pooling operation. In this way,
our method produces a reliable feature pond containing these informative feature codes, even when
only a few samples of each class are available for training.

To take full advantage of the feature pond, we develop a simple but effective and efficient mechanism
for the final classification, called collaborative representation-based classification with locality
constraint (LC-CRC for short), which is similar in appearance to sparse representation-based
classification (SRC) [13], but differs essentially in two ways: (1) $\ell_2$-norm regularization is adopted in the
Figure 2: The flowchart of our proposed robust texture classification approach.

least square fitting problem rather than an $\ell_1$-norm penalty, and (2) a locality constraint is employed to
speed up the classification process.

We summarize the advantages of our texture classification system below: 

• Different from many state-of-the-art texture classification methods, which combine several types
of descriptors, our approach uses only a single type of feature descriptor, i.e. the SIFT
descriptor [8]. Thus, our method is much simpler but still capable of discriminating textures,
as demonstrated in the experiments.

• Benefiting from the flexible partition strategy, the proposed method can produce redundant 
feature codes to form a reliable feature pond, even though only a few samples of each category 
are available for training. 

• Instead of the widely used SVMs, a simple but effective classification mechanism, LC-CRC, is
developed in our method. It facilitates overall translation, scale, and viewpoint invariance in
classification, and experimental results demonstrate that the suggested LC-CRC is very effective
and efficient.

The rest of this paper is organized as follows. Section 2 gives a very brief review of the related work.
Details of our framework are elaborated in Section 3. We show experimental results in Section 4
and conclude with some discussion in Section 5.



2 Related Work 



We review closely related work on spatial pyramid matching (SPM), sparse representation-based 
classification paradigm (SRC), and local coordinate coding (LCC). 

The SPM framework has achieved remarkable success on a range of image classification benchmarks and
remains a major component of state-of-the-art systems [Ml [7] [15] [HI [H]. In SPM, the spatial
order of local descriptors is taken seriously in image classification tasks; however, it is of little
importance and captures no essential features in texture classification, because texture can be described
as "a subtle balance between repetition and innovation" [16], or approximate reduplication. Despite
this, SPM inspires us to borrow the idea of a multi-level scheme (pyramid partition of the
image) to design a locally invariant framework for texture classification. This is one focus of our
work: adopting a more flexible partition configuration with multiple overlapping patterns, such
as {3,4,5}¹, as Figure 1 shows, in place of the common fashion of SPM, e.g. {1,2,4} in [TU] and
{1,2,4,8} in [12].

Besides the widely used SVM classifier, sparse representation-based classification (SRC) [13] has achieved
great success in face recognition, and it has boosted research on sparsity-based pattern classification.
SRC solves an $\ell_1$-norm penalized least square problem and identifies the class label
by checking the reconstruction error class by class. However, $\ell_1$-minimization makes SRC very computationally
expensive, and recent research shows that it lacks stability in face recognition [17] [18]. Furthermore, Zhang et
al. point out that what makes SRC work well is collaborative
representation [18], and they propose a new $\ell_2$-regularized classification algorithm called collaborative
representation-based classification with regularized least square (CRC-RLS), which adopts an $\ell_2$-norm
penalty rather than the $\ell_1$ regularizer in SRC. This modification leads to simple ridge regression.
However, when the number of collaborative data (training data) grows, calculating the coefficients by
ridge regression becomes more computationally expensive, because the inversion of a larger matrix
is involved.

Fortunately, in parallel work focusing on the high-dimensional sparse coding problem, Yu et al. empirically
observe that sparse coding results tend to be local: nonzero coefficients are often assigned to bases
near the encoded data. They also show theoretically that, under certain assumptions, locality
is more essential than sparsity [H]. To make full use of this local relation, they suggest a
modification to sparse coding called local coordinate coding (LCC). Based on their work, Wang et
al. propose a practical method called locality-constrained linear coding (LLC) as a fast implementation of LCC,
and approximate LLC by performing a K-nearest-neighbor (KNN) search ahead of time [20].

Combining CRC-RLS and the idea of LLC, we develop a new classification mechanism, which
adopts the collaborative representation-based recipe regularized by an $\ell_2$ penalty and employs a KNN search
beforehand. Therefore, our classification approach is more stable to outliers, as stated in [17], and
much more efficient, because only a small ridge regression is involved even when the number of
training samples is large.



3 The proposed Texture Classification Framework 

Focusing on the four basic elements of a reliable texture classification system, in this section we
introduce our proposed framework in detail: local texture descriptors, overall texture image
representation, measurement, and classification mechanism. The overall flowchart of our approach is
displayed in Figure 2. Notations used in this paper are introduced in Subsection 3.1.

3.1 Local Texture Descriptor 

In our work, we use a single type of feature descriptor, the popular SIFT descriptor [8], which
is extracted on a dense grid rather than at interest points and has been shown to yield superior
classification performance in [10] [20] [HI [11]. Suppose there are $T$ images from $C$ classes, where $\mathcal{I}_c$
denotes the index set of the $c$-th class, and let the $t$-th image be represented by a set of dense SIFT descriptors
$x_i^{(t)} \in \mathbb{R}^d$ ($d = 128$ for the SIFT descriptor) at $N$ locations identified by their indices $i = 1, \ldots, N$.
$M$ regions of interest are defined on the image, with $\mathcal{N}_m$ denoting the set of locations/indices within
region $m$, and $m \in \mathcal{L}_l$ means that the $m$-th region belongs to the $l$-th level, i.e. $\mathcal{L}_l$ indexes the regions in the $l$-th level.
Then we use all the dense SIFT descriptors to train a dictionary $\mathbf{D} \in \mathbb{R}^{d \times D}$, and apply the learned
dictionary to represent each dense SIFT descriptor as a sparse code vector, as in the formulation



¹ That is, an image is partitioned into 3 x 3 grid cells at the first level, and 4 x 4 and 5 x 5 at the second and third
levels, respectively. Altogether, 3 x 3 + 4 x 4 + 5 x 5 = 50 sub-images (regions, or grid cells) are formed.



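below:

$$
(\hat{a}_i^{(t)}, \hat{\mathbf{D}}) = \arg\min_{a_i^{(t)},\, \mathbf{D}} \sum_{t=1}^{T} \sum_{i=1}^{N} \left\{ \| x_i^{(t)} - \mathbf{D} a_i^{(t)} \|_2^2 + \lambda \| a_i^{(t)} \|_1 \right\}
\quad \text{s.t. } d_i^T d_i \le 1 \text{ for } i = 1, \ldots, D, \tag{1}
$$

where $d_i$ is the $i$-th column of $\mathbf{D}$ and $a_i^{(t)} \in \mathbb{R}^D$ is the corresponding sparse code vector.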

Each element $a_k$ of the code vector $a$ indicates the local descriptor's response to the $k$-th visual
word in the dictionary $\mathbf{D}$. We align all the SIFT descriptors belonging to region $m$ as a matrix
$X \in \mathbb{R}^{d \times |\mathcal{N}_m|}$; then the corresponding code matrix $A \in \mathbb{R}^{D \times |\mathcal{N}_m|}$ is obtained. Here we aggregate
the local descriptors' responses across all the $|\mathcal{N}_m|$ locations of this region into an $|\mathcal{N}_m|$-dimensional
response vector $a^k$ (the $k$-th row of $A$), in which each element $a_{k,i}$ of $a^k$ represents the response of
the local descriptor $x_i$ at the $i$-th location to the $k$-th visual word. After obtaining all the feature
codes $A$ within a region, we can use a pooling operation to pool them into
a single vector $y$ of fixed dimension, as described in Subsection 3.1.2. Before feature pooling, we first
address the relevant partition issues.

3.1.1 Partition Issues 

Unlike the classical and commonly used SPM scheme, which is a three- or four-level pyramid
comprising pooling regions of {1 x 1, 2 x 2, 4 x 4} or {1 x 1, 2 x 2, 4 x 4, 8 x 8} [H], we adopt a more
flexible partition strategy and divide the original image into finer regions, e.g. {3 x 3, 4 x 4, 5 x 5},
as Figure 1 shows.

Relying on this flexible partition alone, we observe that the proposed method can
indeed capture local features at different scales and is resilient to local variations such as translation,
illumination, and scale. But we go further by permitting different overlapping patterns at the same
level of the pyramid. Various overlapping patterns within a single level produce more regions for
the same partition pattern, e.g. a single level of 4 x 4 partition with 4 overlapping patterns
leads to 4 x 4 x 4 = 64 regions, as displayed in row (c) of Figure 1, and accordingly 64 feature
codes are formed. More overlapping choices produce more local texture features on multiple
scales, and these redundant local texture features can effectively alleviate the classification
challenge caused by local variations.

This way of partitioning with multiple overlapping patterns prevents the statistical information or
pooled feature codes of local regions from becoming too rigid or too loose for texture discrimination,
and in conjunction with our proposed classification mechanism described in Subsection 3.3, it
leads to state-of-the-art texture classification performance in the experiments.
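The region counts quoted above can be reproduced with a small sketch. The text does not specify how the overlapping patterns are constructed, so the version below makes an assumption: each pattern grows every grid cell by a fixed fraction of the cell size on all sides, clipped to the image border. The function names `grid_regions` and `multi_level_regions` and the `overlaps` fractions are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple

# (top, left, bottom, right) in pixels; bottom and right are exclusive.
Region = Tuple[int, int, int, int]


def grid_regions(height: int, width: int, n: int, overlap: float = 0.0) -> List[Region]:
    """Split an image into an n x n grid.

    Assumption (not from the paper): an overlapping pattern is modelled by
    growing each cell by `overlap` times the cell size on every side.
    """
    cell_h, cell_w = height / n, width / n
    pad_h, pad_w = overlap * cell_h, overlap * cell_w
    regions = []
    for r in range(n):
        for c in range(n):
            top = max(0, int(round(r * cell_h - pad_h)))
            left = max(0, int(round(c * cell_w - pad_w)))
            bottom = min(height, int(round((r + 1) * cell_h + pad_h)))
            right = min(width, int(round((c + 1) * cell_w + pad_w)))
            regions.append((top, left, bottom, right))
    return regions


def multi_level_regions(height: int, width: int,
                        levels: Sequence[int],
                        overlaps: Sequence[float]) -> List[Region]:
    """Collect the regions of every partition level under every overlapping pattern."""
    regions: List[Region] = []
    for n in levels:
        for ov in overlaps:
            regions.extend(grid_regions(height, width, n, ov))
    return regions
```

Under this assumption, a single 4 x 4 level with four overlap fractions yields 4 x 4 x 4 = 64 regions, and the levels {3, 4, 5} without overlap yield 50 regions, matching the counts in the text.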

3.1.2 Feature Pooling 

Feature pooling essentially maps the response vectors within each region into a statistic value
$f(a^k)$ via some spatial pooling operation $f$. Among various pooling methods, such as average
pooling, max pooling, and other pooling methods that transition from average to max [21], max
pooling is inspired by the mechanism of the complex cells in the primary visual cortex and has been
shown to be a powerful operation both empirically and theoretically [10] [21] HH HI]. In this paper, we also
adopt max pooling for its translation invariance at different levels of partition [35].



After obtaining the code matrix $A$ of region $m$, we can pool the code vectors into one feature vector
$y_m \in \mathbb{R}^D$ to represent region $m$:

$$
y_m(A) = \left[ f(a^1), \ldots, f(a^k), \ldots, f(a^D) \right]^T
= \left[ \max_i a_{1,i}, \ldots, \max_i a_{k,i}, \ldots, \max_i a_{D,i} \right]^T \tag{2}
$$

No matter how the sizes of different regions differ, the pooled feature code has the same
dimension and summarizes well the distribution of the SIFT feature descriptors in each region. This
property enables us to adopt the flexible partition and various overlapping patterns within the
same level of partition, thereby producing redundant local texture features.
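As a minimal illustration of Equation 2, the sketch below max-pools a code matrix over the locations of each region with NumPy. The layout is an assumption made for the example: codes are stored as a matrix with one row per visual word and one column per location, and each region is given as a list of column indices. The names `max_pool_region` and `pool_image` are hypothetical.

```python
import numpy as np


def max_pool_region(codes: np.ndarray, location_ids) -> np.ndarray:
    """Max-pool a (words x locations) code matrix over the given locations.

    Returns one value per visual word, so the output length does not
    depend on how many locations the region contains.
    """
    return codes[:, location_ids].max(axis=1)


def pool_image(codes: np.ndarray, region_indices) -> np.ndarray:
    """Stack the pooled code of every region as one column (order is irrelevant)."""
    return np.stack([max_pool_region(codes, ids) for ids in region_indices], axis=1)
```

For example, with two visual words and three locations, regions of one, two, or three locations all produce a pooled vector of length two, which is what allows regions of different sizes and overlaps to share one representation.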

3.2 Texture Image Descriptor 

As described in the previous subsection, we store all the pooled feature codes of one image to form a
matrix $Y = [y_1, \ldots, y_m, \ldots, y_M]$ as the new texture image representation. That is to say, regardless
of region size and overlapping patterns, all the pooled feature vectors of regions are stored in an
orderless way. This orderless storage, in conjunction with max pooling, enjoys translation and scale
invariance. From Figure 1, it is not difficult to see that samples from the same class can represent
one another by the statistical information that max pooling accumulates, which is locally translation
and scale invariant; therefore, an overall invariance property can be attained. We will see the benefit of
this orderless storage in the experiments in Section 4.

3.3 Measure and Classification 

Actually all the pooled feature vectors from regions of various levels of training images can be seen 
as redundant feature bases, or a feature pond, which can effectively represent pooled feature codes 
of a new image, and in this way, scale and translation invariance can be achieved. To fully take 
advantage of the benefit of orderless feature vector storage, we utilize a regularized least square 
(RLS) framework for the final classification. It is similar in appearance to sparse representation- 
based classification (SRC) [13], but essentially different. 

In SRC, a vectorized test image $z$ is coded collaboratively over the dictionary of all $T$ training
samples $Y = [y_1, \ldots, y_t, \ldots, y_T]$ under an $\ell_1$-norm sparsity constraint, where $y_t$ is the $t$-th vectorized training
sample. SRC first calculates the sparse coefficients by the formulation:

$$
\hat{a} = \arg\min_a \| z - Ya \|_2^2 + \lambda \| a \|_1 \tag{3}
$$

Then, SRC evaluates each class individually to determine which class $z$ should belong to. In other
words, it calculates the reconstruction error $r_c = \| z - Y_c \hat{a}_c \|_2$ for all the $C$ classes, where $Y_c$ is formed
by the columns indexed by $\mathcal{I}_c$ and $\hat{a}_c$ is formed in a similar way. Finally, it selects $\hat{c} = \arg\min_c r_c$
as the predicted label.

Although SRC has shown interesting results in face recognition and has been widely studied in the
community, researchers have recently found that the $\ell_1$-norm penalty in Equation 3 actually
makes the classification framework unstable [HI [17], as well as computationally very expensive.
Zhang et al. point out that what improves face recognition accuracy in SRC is the use of collaborative
representation, not $\ell_1$ sparsity [18]. They propose a collaborative representation-based
classification framework with regularized least square (CRC-RLS) that solves a ridge regression
formulation:

$$
\hat{a} = \arg\min_a \| z - Ya \|_2^2 + \lambda \| a \|_2^2. \tag{4}
$$



Algorithm 1 Algorithm of LC-CRC

Input: feature descriptor matrix of the testing image $Z = [z_1, \ldots, z_M]$ and the feature pond formed by all the
training samples $Y = [Y_1, \ldots, Y_t, \ldots, Y_T]$, where $Y_t = [y_1, \ldots, y_M]$; parameter $K$ for the KNN search and $\lambda$
for balancing the $\ell_2$-norm penalty and the least square fitting.
Output: predicted label of the test image.

1: Normalize the columns of $Z$ and $Y$ to have unit $\ell_2$-norm;
2: for $m = 1, 2, \ldots, M$ do
3: Use KNN within the feature pond $Y$, selecting the $K$ neighbors of $z_m$ to form the matrix $Y_{(K)} \in \mathbb{R}^{D \times K}$ with $K$
indices $\mathcal{H}_m$;
4: Code $z_m$ over $Y_{(K)}$ by
$\tilde{a}^m = (Y_{(K)}^T Y_{(K)} + \lambda I)^{-1} Y_{(K)}^T z_m$;
5: Form the $MT$-dimensional vector $a^m$, whose elements at the $\mathcal{H}_m$ locations are filled with $\tilde{a}^m$, with zeros
elsewhere;
6: end for
7: Compute the reconstruction error for each class:
$r_c = \sum_{l=1}^{L} \left\{ \min_{m \in \mathcal{L}_l} \| z_m - Y_c a_c^m \|_2 \right\}$;
8: Output the identity of the test image $Z$ as:
identity($Z$) $= \arg\min_c (r_c)$.



Following the remaining steps of SRC, CRC-RLS achieves very competitive classification results but with
significantly less complexity than SRC.
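As a brief worked step, the closed-form solution of the ridge regression in Equation 4 follows from setting the gradient of the objective to zero. This is the standard ridge regression derivation, written in the notation of Equation 4:

```latex
\begin{aligned}
\nabla_a \left( \| z - Ya \|_2^2 + \lambda \| a \|_2^2 \right)
  &= -2 Y^T (z - Ya) + 2 \lambda a = 0 \\
\Rightarrow \quad (Y^T Y + \lambda I)\, \hat{a} &= Y^T z \\
\Rightarrow \quad \hat{a} &= (Y^T Y + \lambda I)^{-1} Y^T z .
\end{aligned}
```

For $\lambda > 0$ the matrix $Y^T Y + \lambda I$ is positive definite, so the inverse always exists.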

However, when the number $T$ of training samples grows, calculating the coefficients by ridge
regression, $\hat{a} = (Y^T Y + \lambda I)^{-1} Y^T z$, becomes more computationally expensive, because the inversion
of a larger matrix of size $T \times T$ is involved. To circumvent this problem, we borrow the
idea of LLC [20] described in Section 2 by applying a KNN search within the feature pond before
solving the ridge regression: we choose the $K$ nearest neighbors to form $Y_{(K)} \in \mathbb{R}^{D \times K}$ with indices
$\mathcal{H}_{(K)}$, and represent the testing image by solving a much lower-complexity ridge regression,
$\tilde{a} = (Y_{(K)}^T Y_{(K)} + \lambda I)^{-1} Y_{(K)}^T z$. After this, an overall coefficient vector $\hat{a} \in \mathbb{R}^T$ is formed by
embedding the elements of $\tilde{a} \in \mathbb{R}^K$ in the $\mathcal{H}_{(K)}$ locations of $\hat{a}$ and zeros elsewhere. The final classification
follows SRC, and Algorithm 1 shows the whole classification algorithm².
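The classification procedure described above can be sketched in NumPy as follows. This is an illustrative reading of Algorithm 1 rather than the authors' implementation: the function name `lc_crc_predict`, the argument layout (one pond column per training region, with a `labels` array giving the class of each column and a `levels` array giving the partition level of each test region), and the brute-force neighbor search are all assumptions. The per-class error sums, over partition levels, the smallest reconstruction error within each level, following the footnote to Algorithm 1.

```python
import numpy as np


def lc_crc_predict(Z, Y, labels, levels, K=100, lam=1e-3):
    """Illustrative LC-CRC classifier (sketch, not the authors' code).

    Z      : (D, M) pooled codes of the test image, one column per region.
    Y      : (D, P) feature pond, one column per training region.
    labels : (P,) class of the training image each pond column came from.
    levels : (M,) partition level of each test region.
    Returns the predicted class and the per-class error vector.
    """
    # Step 1: unit l2-norm columns (guarding against all-zero columns).
    Z = Z / np.maximum(np.linalg.norm(Z, axis=0, keepdims=True), 1e-12)
    Y = Y / np.maximum(np.linalg.norm(Y, axis=0, keepdims=True), 1e-12)
    labels = np.asarray(labels)
    levels = np.asarray(levels)
    classes = np.unique(labels)
    K = min(K, Y.shape[1])

    errors = np.empty((len(classes), Z.shape[1]))
    for m in range(Z.shape[1]):
        z = Z[:, m]
        # Step 3: for unit-norm columns, the largest inner products
        # correspond to the smallest Euclidean distances.
        nn = np.argsort(-(Y.T @ z))[:K]
        Yk = Y[:, nn]
        # Step 4: small K x K ridge regression.
        coef = np.linalg.solve(Yk.T @ Yk + lam * np.eye(K), Yk.T @ z)
        # Class-wise reconstruction error using only that class's neighbors.
        for ci, c in enumerate(classes):
            mask = labels[nn] == c
            errors[ci, m] = np.linalg.norm(z - Yk[:, mask] @ coef[mask])

    # Step 7: sum over levels of the smallest error within each level.
    r = np.zeros(len(classes))
    for lev in np.unique(levels):
        r += errors[:, levels == lev].min(axis=1)
    return classes[np.argmin(r)], r
```

Restricting the regression to the $K$ selected neighbors means each solve involves only a $K \times K$ system, regardless of the size of the feature pond. A class with no neighbor among the $K$ selected columns reconstructs nothing, so its error for that region equals the norm of the (normalized) test column.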



4 Experiment 

We evaluate the performance of the proposed texture classification framework on three public 
datasets: Brodatz dataset [23], KTH-TIPS dataset [24] and UMD texture database [1]. 

The Brodatz dataset is a well-known benchmark database for evaluating texture recognition
algorithms. It contains 111 different texture classes. Each class is represented by only one sample,
which is then divided into 9 non-overlapping sub-images to form the database. Thus, there are



² Because one texture image is represented by a descriptor matrix, as Subsection 3.2 introduces, each column
of the descriptor matrix is treated individually in the algorithm, and the final reconstruction error is accumulated
over L columns (each column denotes a pooled feature code of a specific region), i.e. it is the sum of the smallest
error of each level. Empirically, we find that using the smallest error for classification yields better performance
than using the reconstruction errors of all the codes.



999 images altogether with a resolution of 215 x 215. Although this dataset lacks intra-class variations,
Lazebnik et al. point out that this database is a challenging platform for testing the discriminative
power of texture descriptors, thanks to its variety of scales and geometric patterns [4]. The KTH-TIPS
texture dataset contains ten texture classes. Images are captured at nine scales spanning two
octaves (relative scale changes from 0.5 to 2), viewed under three different illumination directions
and three different poses, thus giving a total of 9 images per scale and 81 images per material
class. Some sample images are shown in Figure 5, and we can easily see that the scaling and illumination
changes increase the intra-class variability and make this database especially difficult for the classification
task. The UMD texture database is composed of 25 different texture classes with 40 samples each,
and all images are grayscale with 1280 x 960 pixels (1000 samples altogether). The textures are acquired
under strong viewpoint and scale changes, arbitrary rotations, and significant contrast differences,
even including textures with non-rigid deformation. Figure 1 displays some sample images from this
database.



4.1 Configurations and Implementation 

Dictionary learning and sparse coding. As a dictionary learning problem, Equation 1 is
convex in the codes $a_i^{(t)}$ with $\mathbf{D}$ fixed and vice versa, but not convex in both simultaneously. The
conventional way to solve such a problem is to iterate, alternately optimizing over $\mathbf{D}$ or the $a_i^{(t)}$
while fixing the other. Fixing $\mathbf{D}$, the optimization can be solved by optimizing over each coefficient
vector $a_i^{(t)}$ individually:

$$
\min_{a_i^{(t)}} \| x_i^{(t)} - \mathbf{D} a_i^{(t)} \|_2^2 + \lambda \| a_i^{(t)} \|_1
$$

This is essentially a linear regression problem with $\ell_1$-norm regularization on the coefficients, i.e.
the Lasso in the statistical literature. In our work, we solve this optimization with a very efficient algorithm
called feature-sign search [25]. Fixing all the $a_i^{(t)}$, the problem reduces to a least square problem
with quadratic constraints:

$$
\min_{\mathbf{D}} \| X - \mathbf{D} A \|_F^2 \quad \text{s.t. } \| d_i \|_2 \le 1 \text{ for } i = 1, \ldots, D.
$$

The optimization can be done efficiently via the Lagrange dual, as used in [25]. Throughout this
paper, the parameter $\lambda$ is set to 0.1. In our experiments, the learned dictionaries contain 1500 visual words
for the Brodatz and KTH-TIPS datasets, and 3000 visual words for the UMD dataset.
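To make the sparse coding step with a fixed dictionary concrete, the sketch below minimizes the same $\ell_1$-regularized objective for a single descriptor. Note the substitution: the paper uses feature-sign search [25], whereas this example uses ISTA (iterative shrinkage-thresholding), a simpler and generally slower solver chosen here only because it fits in a few lines. The function names are hypothetical, and the dictionary update step is not shown.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(x: np.ndarray, D: np.ndarray, a: np.ndarray, lam: float) -> float:
    """Value of ||x - D a||_2^2 + lam * ||a||_1."""
    return float(np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a)))


def sparse_code_ista(x: np.ndarray, D: np.ndarray, lam: float = 0.1, n_iter: int = 200) -> np.ndarray:
    """Sparse-code one descriptor x over a fixed dictionary D with ISTA.

    ISTA is a stand-in for the feature-sign search used in the paper.
    """
    # Lipschitz constant of the gradient of ||x - D a||_2^2.
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a
```

With the step size set to the reciprocal of the Lipschitz constant, each ISTA iteration does not increase the objective, so the returned code is never worse than the all-zero starting point.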

Partition strategy and overlapping patterns. For the experiments, we partition all the texture
images into 4 levels (2 x 2, 3 x 3, 4 x 4, 5 x 5) for the Brodatz dataset, 3 levels (6 x 6, 7 x 7, 8 x 8) for the
KTH-TIPS dataset, and 4 levels (3 x 3, 4 x 4, 5 x 5, 6 x 6) for the UMD texture database. Furthermore,
at each partition level, we admit various overlapping patterns. We empirically find that these
partition strategies produce satisfactory results on the three datasets.

LC-CRC framework for classification. In our proposed LC-CRC classification framework,
there is a parameter $\lambda$ (different from the one used for dictionary learning in Equation 1) that makes the
solution of the least square problem in Equation 4 stable. Through empirical observations, we find
that the experimental results are not sensitive to the choice of $\lambda$ as long as a small value
less than 0.01 is assigned, and thus we set $\lambda$ to 0.001 throughout our work. Moreover, the parameter $K$ of the KNN
algorithm must be specified: we set $K = 100$ when the number of training samples per class
is small, e.g. when only 1 or 2 samples of each class are available for training, and $K = 300$ when more
training samples per class are available.
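For convenience, the settings stated in this subsection can be gathered into a single configuration, sketched below. The values are those given in the text; the names `EXPERIMENT_CONFIG`, `SPARSE_CODING_LAMBDA`, `LC_CRC_LAMBDA`, and `knn_size` are hypothetical. The cut-off between "small" and larger training sets is an assumption: the text only gives 1 or 2 samples per class as examples of the small case.

```python
# Per-dataset settings as stated in the text.
EXPERIMENT_CONFIG = {
    "Brodatz": {"dictionary_size": 1500, "partition_levels": [2, 3, 4, 5]},
    "KTH-TIPS": {"dictionary_size": 1500, "partition_levels": [6, 7, 8]},
    "UMD": {"dictionary_size": 3000, "partition_levels": [3, 4, 5, 6]},
}

SPARSE_CODING_LAMBDA = 0.1   # l1 weight used for dictionary learning (Equation 1)
LC_CRC_LAMBDA = 0.001        # l2 weight used in the LC-CRC ridge regression


def knn_size(train_per_class: int, small_threshold: int = 2) -> int:
    """Return K for the KNN search.

    Assumption: `small_threshold` = 2 marks the boundary of the
    small-sample regime; the paper does not state an exact cut-off.
    """
    return 100 if train_per_class <= small_threshold else 300
```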
Figure 3: Classification rate vs. number of training samples on the three datasets.

4.2 Brodatz dataset 

Figure 3(a) shows the classification results obtained on the Brodatz database. Following Lazebnik et
al. [26] and Mellor et al. [27], classification rates are estimated by averaging the results over randomly
selected training sets, and 10 trials are performed in our experiments.

SI+CtH indexing and AI+CtH indexing are two shape-based invariant texture features from the
work of Xia et al. [28]. Here SI, proposed by Xia et al., is a kind of feature that is invariant to
(local) similarity transforms, and AI denotes (locally) affine invariant features. Both of them are
made of several histograms, such as the scale ratio histogram, elongation histogram, and compactness
histogram. Moreover, CtH is a contrast histogram computed by scanning all pixels of a local adaptive
neighborhood, which is robust to geometrical distortions of the textures [28]. Because the
samples in this dataset are created by cutting each texture of the Brodatz database into pieces,
the resulting dataset lacks viewpoint and scale changes. For this reason, Xia et
al. also adopt a well-chosen non-invariant indexing scheme (Non-invariant indexing in Figure 3(a)),
which shows better classification results. In contrast to the multiple histograms in [28], our framework
employs only one kind of feature descriptor (SIFT), and it achieves state-of-the-art performance. Note
that when 3 samples per class are used for training, our approach achieves 96.61%. This outcome
is higher than the 95.9% achieved by the method of Zhang et al. [5], which builds on the method of Lazebnik et
al. [26] and employs three types of descriptors (SPIN, RIFT, and SIFT), and is comparable with the
97.16% (the highest classification rate on the Brodatz dataset to the best of our knowledge) attained by
the method of Liu et al. using sorted random projections plus several kernel SVMs [3]. Moreover, it
is worth noticing that when only one image of each class is used for training, our approach achieves
90% accuracy, which is significantly higher than the other methods. This verifies that our approach
can indeed extract a large amount of reliable features for each type of texture, even when only a few
sample images are available for training.



4.3 KTH-TIPS texture database 



Following Zhang et al. [5], we vary the number of training images and record the classification accuracy,
as Figure 3(b) shows. Note that all images are converted to grayscale in our approach, and no
color information is used whatsoever. Three methods are used for comparison, and the results of
these methods are taken directly from the original publications or quoted from the recent comparative
study of Zhang et al. [5]. In [26], Lazebnik et al. first characterize the texture using Harris-affine
corners and Laplacian-affine blobs with two descriptors (SPIN and RIFT), and employ a nearest




Figure 4: Performance on KTH-TIPS texture database, confusion matrix for classification of 10 
different textures. 10 images per class are randomly selected for training, and the rest for testing. 
The number at row R and column C is the proportion of R class which is classified as C class. For 
example, 9.86% of the linen images are misclassified as cotton class. The average accuracy is 94.23%. 

neighbor classifier. Their method achieves 91.3% accuracy when 41 samples of each class are used
for training. Under the same configuration, the method of Zhang et al., introduced in Subsection 4.2,
achieves 96.1%, and the approach of Liu et al. achieves 99.29% (the highest classification rate on the
KTH-TIPS dataset to the best of our knowledge) in [3]. Our method achieves 99.32% when
41 samples per material are used for training, which exceeds the previous best (99.29%).

It is worth noting that our approach achieves (94.1 ± 0.92)% when only 10 images of each class are
randomly selected for training, which is significantly higher than the others. Figure 4 displays one
confusion matrix under this condition. From the confusion matrix, we can see that the misclassifications
mainly concentrate on four materials: corduroy, cotton, linen, and sandpaper. Figure 5 shows some
samples of these four materials, and it can easily be seen that, at certain scales, different
materials look very similar, which results in misclassification among these material
types.



4.4 UMD texture database 



The UMD texture database contains images with larger arbitrary rotations, larger scale variations, and
more significant viewpoint changes than the previous two datasets. Therefore, it is more challenging for
classification.

Figure 3(c) shows the classification rate vs. the number of training samples on the UMD dataset. Xia's
method denotes the SI+CtH indexing method described in Subsection 4.2 in conjunction with the
geodesic distance, which considers textures as points lying on some intrinsic manifold and yields a
clear improvement in their method. Xu's method is based on a combination of the wavelet transform
and multifractal analysis. Liu's method was introduced previously in Subsection 4.2. We can see that when
only a few samples of each class are available for training, our method is comparable to Xia's method,
which achieves the best performance on this database with a small number of training samples. As
the number of training images per class increases, Liu's method obtains better results. Still,
under this condition, our method achieves results comparable with Liu's method. When 20 sample










Figure 5: Similar texture patterns in the KTH-TIPS database. Four different texture types are displayed
here; it is easy to see that some images of the four textures are very similar at various scales. This
phenomenon largely leads to misclassification on this dataset.

images per class are randomly selected for training, 98.6% classification accuracy is achieved by Xu's 
method and 99.30% by Liu's method (the best one ever reported on this database). And our method 
achieves (99.32 ± 0.35)% classification accuracy, which is slightly better than Liu's result. 

From this experiment on the UMD dataset, we can see that our proposed texture classification approach can
extract reliable texture features when only a few training sample images are available and leads to
significantly better results. As the number of training samples grows, our method can still achieve
state-of-the-art performance compared with other methods. Considering the complex
texture sample images in the UMD dataset, these results suggest that our method achieves invariance to
local rotation variation, scale changes, translation, changes of illumination direction, and significant
viewpoint changes.



5 Conclusion and Future Work 

In this paper, focusing on the texture classification task, we introduce a novel and highly effective scheme
for robust texture classification, which is invariant to scale differences, translation, significant viewpoint
changes, and local rotation. Inspired by the SPM framework, we first develop a multi-level descriptor
to describe local texture features, allowing different levels of partition and various overlapping
patterns within each level of partition. From the experiments, we see that this flexible descriptor can better
capture the local features of each kind of texture, and even when only a few samples of each class





are available for training, our method still achieves very high accuracy. Accordingly, we propose
an efficient classification mechanism based on collaborative representation with locality
constraint, called LC-CRC. It first searches for a relatively small number of neighbors in the feature pond with the KNN
algorithm, and then uses them to represent the target by solving a simple least square fitting
problem with $\ell_2$-norm regularization. To evaluate our texture classification framework, we conduct
several experiments on three well-known texture datasets, and the results are very competitive and
even outperform several state-of-the-art methods.

The LC-CRC classification framework treats the feature pond as another dictionary, which is
used to represent the pooled feature codes of testing images. This spirit of hierarchical sparse coding
has already been explored by Yu et al. in [29] for object recognition, but there remain interesting
extensions and confirmations, and part of our future work is to provide some insights into multi-layer
dictionary learning for image classification. Moreover, our work provides a new application of SPM,
and we expect other applications based on SPM and its variants.